Fix/memory bloat and stale processes #39

Open
itsuzef wants to merge 38 commits into main from fix/memory-bloat-and-stale-processes

Conversation

itsuzef (Collaborator) commented Mar 18, 2026

No description provided.

AmrDab and others added 30 commits March 10, 2026 00:55
…server, OpenClaw decoupling

Major changes:
- Multi-layer pipeline: L0 (Browser) → L1 (Router) → L1.5 (Deterministic) → L2 (A11y+CDP) → L2.5 (Vision Hints) → L3 (Computer Use)
- Action verifier with ground-truth checking (blocks false success reports)
- A11y click resolver (bounds-based, zero LLM cost)
- CDP integration for browser DOM interaction via Chrome DevTools Protocol
- Deterministic flows for common tasks (email send, app switch)
- Structured task logging (JSONL per task, verified vs unverified success)
- Universal tool server: 33 tools served via REST and MCP from single definitions
- First-run onboarding consent flow
- Workspace state tracker + pluggable task verifiers
- No-progress loop detector and premature-done blocker
- Smart URL preprocessing and content generation prompts
- Error report module (opt-in, redacted, privacy-first)

OpenClaw decoupling:
- Data directory moved from ~/.openclaw/clawd-cursor/ to ~/.clawd-cursor/
- Automatic migration from legacy path on startup
- openclaw-credentials.ts renamed to credentials.ts (source: 'openclaw' → 'external')
- All user-facing messaging is now platform-neutral
- Postbuild no longer auto-registers as OpenClaw skill
- External integrations (OpenClaw, Codex) detected silently as optional

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…no OpenClaw dependency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d" section

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- clawdcursor mcp now checks hasConsent() and exits with clear stderr
  message if consent has not been given — no more silent cold-start
- Added writeConsentFile() export to onboarding.ts
- Added --accept flag to clawdcursor start (consent + start in one shot)
- Added clawdcursor consent subcommand with --accept, --revoke, --status
- runOnboarding() now takes a context param ('start' | 'consent') to
  show accurate warning language for each entry point
- start warning now says 'AI Agent + REST API on localhost:3847 — any
  local process can call tool endpoints'
- consent warning shows all three transport modes covered by consent
- Hero: clear tagline, install command, stats (33 tools / 6 layers / 3 transports / any model)
- 'Where is the AI coming from?' split — human with API key vs agent connecting
- Three connect modes with setup snippets: MCP, REST, CLI agent
- Consent section — all four commands (interactive / --accept / --status / --revoke)
- 6-layer pipeline visualization with cost indicators per layer
- 33 tools by category (Desktop, A11y, CDP, Orchestration)
- 8 provider cards with text/vision model routing
- v0.6.3 vs v0.7.0 comparison table
- Agent-readable HTML comment block at top for LLM crawlers
- Responsive, matches v0.6.3 design language (dark, Inter, green accent)
- Remove comparison table
- Keep same CSS variables, component patterns, AI cursor animation
- Same mode-card / feat-card / code-box / os-tab patterns from v0.6.3
- Three tabs: MCP Client / CLI Agent / REST (replaces OS tabs)
- Who section: human vs agent split as first decision
- Pipeline: 4-card grid (L0/L1/L2/L3 — simplified)
- What's New: 6 feat-cards, reliability-focused
- CTA: 'Give your AI a body'
- Cleaner, simpler, consistent with existing site
doctor.ts:
- Remove 'Registered as OpenClaw skill' console output in v0.7.0
  (clawd-cursor is standalone — external skill link is silent/optional)

website:
- Hero: 'Give your AI eyes and hands' — no transport jargon
- Who section: use cases first (tell it what to do / connect your AI)
  not 'where is the AI coming from?' technical framing
- Mode cards: Claude Code / Give it tasks / Build with it
- Install tabs: Claude Code / Cursor | Give it tasks | Build with it
- Strip MCP/REST/CLI labels from user-facing copy where not needed
verifiers.ts (full rewrite):
- Default is FAIL/UNCERTAIN, not PASS — no more silent auto-pass on unrecognized tasks
- Error passthrough is FAIL, not PASS — broken checks are never invisible
- Primary verifier: text LLM reads a11y tree + active window + focused element,
  must cite specific screen evidence to return PASS
- LLM verdict requires confidence >= 0.65 AND explicit evidence citation
- UNCERTAIN verdict → treated as FAIL (conservative)
- Fast-path heuristics (app_open, clipboard, navigation) still run first for
  zero-cost trivial checks, but only trusted at confidence >= 0.75/0.80/0.85
- Every check attempt logged as VerifyAttempt: checkName, pass, confidence,
  detail, durationMs, optional error
- Full attemptLog returned on every VerifyResult for caller to log
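
The verdict gating described above could look roughly like the sketch below. The names `LlmVerdict` and `gateVerdict` are hypothetical and do not come from verifiers.ts; only the 0.65 threshold, the evidence requirement, and the "error or UNCERTAIN means FAIL" rule are taken from the commit message.

```typescript
// Hypothetical types; the real verifiers.ts shapes may differ.
type Verdict = "PASS" | "FAIL" | "UNCERTAIN";

interface LlmVerdict {
  verdict: Verdict;
  confidence: number; // expected range 0..1
  evidence?: string;  // screen evidence cited by the LLM
}

// Threshold quoted in the commit message.
const MIN_LLM_CONFIDENCE = 0.65;

/** Returns true only for a confident PASS that cites evidence. */
function gateVerdict(raw: LlmVerdict | Error): boolean {
  if (raw instanceof Error) return false;            // broken check is FAIL
  if (raw.verdict !== "PASS") return false;          // UNCERTAIN is treated as FAIL
  if (raw.confidence < MIN_LLM_CONFIDENCE) return false;
  return typeof raw.evidence === "string" && raw.evidence.trim().length > 0;
}
```

The default outcome is a failure: every branch except the last returns false, so an unrecognized or malformed result can never pass silently.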

a11y-reasoner.ts:
- Logs every individual verify attempt to console with checkName/pass/conf/ms
- Logs evidence string when present
- Verifier errors are BLOCKING (not silent) — thrown exception = FAIL
- Full attemptLog written into logStep verification.detail for JSONL record
- uiStateSummary now includes per-check pass/fail/confidence summary

agent.ts:
- Pass pipelineConfig to TaskVerifier so LLM verifier has text model access
Block characters (█ ╗ ║ etc.) were getting garbled on Windows during
npm install due to code-page encoding issues. Replaced all Unicode
block/box-drawing chars with explicit \\uXXXX escape sequences.
These survive any encoding transform and render correctly in any
Windows terminal with UTF-8 support.
Banner now shows exactly once — during the first-run consent flow
in runOnboarding(). Every new user sees it. Returning users get a
clean single-line status: 'clawd-cursor v0.7.0 — desktop control
active on localhost:3847'.

- onboarding.ts: added printBanner(), called at top of runOnboarding()
  before the consent warning; also replaced raw box-drawing chars with
  unicode escapes in the consent box
- index.ts: removed full banner from start command; replaced with a
  compact one-liner status message
…lti-monitor, macOS perms

server.ts:
- Bearer token auth: generated at startup, saved to ~/.clawd-cursor/token (mode 0o600)
- Token printed to console on start alongside the server URL
- requireAuth middleware applied to all mutating + sensitive endpoints:
  POST /task, /action, /confirm, /abort, /favorites, /report, /stop
  GET  /screenshot
- CORS middleware: blocks cross-origin browser requests (SSRF/localhost-bypass
  prevention); only localhost:3847 origin is allowed; API callers without an
  Origin header (curl, CLI, MCP) pass through unaffected
- Favorites path moved from process.cwd() to ~/.clawd-cursor/ (persists across cwd)
- Imports DATA_DIR from paths.ts
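
A minimal, framework-free sketch of the two checks described above, under stated assumptions: `isAuthorized` and `isOriginAllowed` are hypothetical names, the `http://` scheme on the allowed origin is assumed, and the constant-time comparison is an addition that the commit message does not mention.

```typescript
// Assumption: the allowed origin uses plain http on the default port.
const ALLOWED_ORIGIN = "http://localhost:3847";

// Compares two strings without exiting early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

/** Checks an Authorization header of the form "Bearer <token>". */
function isAuthorized(authHeader: string | undefined, token: string): boolean {
  if (!authHeader) return false;
  const match = /^Bearer\s+(\S+)$/i.exec(authHeader.trim());
  return match !== null && constantTimeEqual(match[1], token);
}

/** Requests without an Origin header (curl, CLI, MCP) pass through. */
function isOriginAllowed(origin: string | undefined): boolean {
  if (origin === undefined) return true;
  return origin === ALLOWED_ORIGIN;
}
```

A middleware built on these helpers would answer 401 when `isAuthorized` is false and reject the request when `isOriginAllowed` is false.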

agent.ts:
- Global 10-minute wall-clock timeout on executeTask() via Promise.race
- Timeout sets this.aborted = true so loops can exit cleanly
- Internal pipeline moved to _executeTaskInternal() — public API unchanged
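
The timeout pattern above can be sketched as a small helper. `withTimeout` is a hypothetical name, not a function from agent.ts; the `onTimeout` callback stands in for setting `this.aborted = true`.

```typescript
/**
 * Races `work` against a timer. If the timer fires first, `onTimeout`
 * runs (for example to set an abort flag) and the returned promise rejects.
 */
async function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  onTimeout: () => void,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      onTimeout();
      reject(new Error(`Task timed out after ${ms} ms`));
    }, ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer so a finished task does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that Promise.race does not cancel the losing promise. That is presumably why the commit also sets an abort flag: the inner loops have to observe it and stop on their own.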

safety.ts:
- type actions in terminal contexts (cmd, powershell, bash, wt, etc.) now
  require Confirm tier instead of Preview — keystrokes in a terminal can
  execute arbitrary shell commands
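
As an illustration only, the tier decision for `type` actions might reduce to a lookup like the one below. `tierForTypeAction` and the `Tier` type are hypothetical; the process list contains only the four names given in the commit message, although the message says there are more ("etc.").

```typescript
type Tier = "auto" | "preview" | "confirm";

// Only the names listed in the commit message; the real list is longer.
const TERMINAL_PROCESSES = new Set(["cmd", "powershell", "bash", "wt"]);

/** Typing into a terminal can run shell commands, so it needs confirmation. */
function tierForTypeAction(processName: string): Tier {
  const name = processName.toLowerCase().replace(/\.exe$/, "");
  return TERMINAL_PROCESSES.has(name) ? "confirm" : "preview";
}
```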

doctor.ts:
- macOS: added Screen Recording + Accessibility permission checks before the
  a11y bridge test; uses screencapture dry-run and osascript UI elements check;
  clear error messages linking to System Settings location

native-desktop.ts:
- Added MonitorInfo interface
- Added getMonitors(): enumerates all monitors on Windows (PowerShell
  System.Windows.Forms.Screen), macOS (osascript), Linux (xrandr)
- Added captureMonitor(index): captures a specific monitor region using
  nut-js screen.grabRegion() + sharp resize; falls back to primary on error

paths.ts:
- Added FAVORITES_PATH and TOKEN_PATH exports
…dinate scaling

Files:
- src/__tests__/coordinate-scaling.test.ts (14 tests) — pure math, zero deps
  Scale factor computation, LLM→real coord mapping, multi-monitor offsets

- src/__tests__/safety.test.ts (16 tests) — full SafetyLayer coverage
  Terminal type→Confirm tier (powershell/cmd/bash/wt), non-terminal→Preview,
  blocked patterns, confirm patterns, auto tier, isBlocked()

- src/__tests__/verifiers.test.ts (9 tests) — TaskVerifier behavior
  attemptLog always populated, error=FAIL not PASS, appOpen fast-path,
  clipboard fast-path pass/fail, graceful fallback without pipelineConfig

- src/__tests__/action-router.test.ts (16 tests) — ActionRouter routing logic
  Multi-step compound task rejection (5 cases), type routing (typeText call),
  URL navigation detection, write≠type, telemetry counting/reset

- vitest.config.ts — test runner config, node env, src/__tests__ glob

All tests mock nut-js and sharp to avoid native binary requirements.
55/55 passing, 0 failures.
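
For readers unfamiliar with the math these tests cover, here is a sketch of LLM-to-screen coordinate mapping. It assumes independent x and y scale factors and a per-monitor origin offset; the actual formula in the codebase is not shown in this PR and may differ.

```typescript
interface Size { width: number; height: number }
interface Point { x: number; y: number }

/** Ratio between the real screen size and the (downscaled) image the LLM saw. */
function scaleFactor(real: Size, llm: Size): Point {
  return { x: real.width / llm.width, y: real.height / llm.height };
}

/** Maps a point in LLM image space to real screen pixels on a given monitor. */
function llmToReal(
  p: Point,
  real: Size,
  llm: Size,
  monitorOrigin: Point = { x: 0, y: 0 },
): Point {
  const s = scaleFactor(real, llm);
  return {
    x: Math.round(p.x * s.x) + monitorOrigin.x,
    y: Math.round(p.y * s.y) + monitorOrigin.y,
  };
}
```

With a 2560x1440 screen and a 1280x720 screenshot, the point (640, 360) maps to (1280, 720). On a second monitor whose origin is (2560, 0), it maps to (3840, 720).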
… Groq, Llama-vision, xAI...)

generic-computer-use.ts (new file):
- Full screenshot → action → screenshot loop using OpenAI function-calling format
- Works with any provider that has a vision model + OpenAI-compat API
- DESKTOP_ACTION_TOOL: single structured tool with discriminated union of action types
  (screenshot, click, double_click, right_click, type, key, scroll, drag, move, wait, done)
- tool_choice: 'required' — LLM must always return a tool call, never prose-only
- Anti-loop guard: blocks LLM from taking >3 consecutive screenshots without acting
- Safety check on every action via SafetyLayer (blocked actions returned as tool error)
- Ground-truth verification on 'done' claims — verifier failure feeds back to LLM
- Coordinate scaling: LLM-space → real screen via same scale factor as other layers
- A11y tree (getScreenContext) injected into screenshot result for grounding
- 2-minute per-call timeout, 25 iteration max, graceful error passthrough
- isGenericComputerUseSupported(): checks vision model + API key, excludes Anthropic
  (which has its own native CU implementation)
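
The anti-loop guard can be pictured as a small counter. This is a sketch under assumptions: `ScreenshotLoopGuard` is a hypothetical name, and only the limit of 3 consecutive screenshots comes from the commit message.

```typescript
class ScreenshotLoopGuard {
  private consecutive = 0;
  private readonly max: number;

  constructor(max: number = 3) {
    this.max = max;
  }

  /** Returns true if the action may proceed, false if it should be blocked. */
  allow(actionType: string): boolean {
    if (actionType !== "screenshot") {
      this.consecutive = 0; // any real action resets the streak
      return true;
    }
    this.consecutive += 1;
    return this.consecutive <= this.max;
  }
}
```

When `allow` returns false, the loop would presumably return a tool error asking the model to act rather than look again.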

agent.ts:
- Import GenericComputerUse and isGenericComputerUseSupported
- Add genericComputerUse field alongside computerUse
- Constructor: init genericComputerUse when Anthropic CU not available but vision key exists
- L3 dispatch updated with 3-tier cascade:
  1. this.computerUse   → Anthropic native Computer Use (tool spec, beta headers)
  2. this.genericComputerUse → Generic OpenAI-compat loop (GPT-4o, Gemini, Groq, etc.)
  3. executeLLMFallback → Legacy vision fallback (no structured tool schema, kept for compat)
- Step label updated: 'Layer 3 (Anthropic)' / 'Layer 3 (Generic)' / 'Layer 3 (legacy)'

providers.ts:
- Added Gemini (generativelanguage.googleapis.com/v1beta/openai, gemini-2.0-flash)
- Added Mistral AI (pixtral-large-latest for vision)
- Added xAI/Grok (grok-2-vision-1212)
- Key auto-detection: AIza → gemini, xai- → xai
- All three are openaiCompat: true — they all work with the generic CU loop
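
The key auto-detection rule above amounts to a prefix check. The sketch below covers only the two prefixes named in this commit; `detectProviderFromKey` is a hypothetical name and other providers are out of scope.

```typescript
/** Guesses the provider from the API key prefix; undefined when unknown. */
function detectProviderFromKey(apiKey: string): "gemini" | "xai" | undefined {
  if (apiKey.startsWith("AIza")) return "gemini";
  if (apiKey.startsWith("xai-")) return "xai";
  return undefined;
}
```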
…resort

cdp-driver.ts:
- connect(): filter known OEM/vendor widget URLs (Lenovo Vantage, MSN/Bing widgets,
  NTP pages) in addition to edge:// and chrome:// — these are https:// URLs but
  behave as system pages with JS disabled, causing the agent to get stuck
- Among remaining user pages, pick the last one (most recently opened/navigated)
  instead of any arbitrary real page
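
A sketch of the page selection described above. `pickUserPage` is a hypothetical helper, and the vendor blocklist is passed in as a parameter because the real list of OEM widget URLs is not shown in this PR.

```typescript
/**
 * Drops browser-internal pages and blocklisted vendor hosts, then returns
 * the last remaining URL (assumed to be the most recently opened page).
 */
function pickUserPage(pageUrls: string[], blockedHosts: string[]): string | undefined {
  const candidates = pageUrls.filter((raw) => {
    if (raw.startsWith("edge://") || raw.startsWith("chrome://")) return false;
    try {
      const host = new URL(raw).hostname.toLowerCase();
      // Match the blocked host itself or any of its subdomains.
      return !blockedHosts.some((b) => host === b || host.endsWith("." + b));
    } catch {
      return false; // unparsable URLs are not usable pages
    }
  });
  return candidates[candidates.length - 1];
}
```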

a11y-reasoner.ts:
- Last-resort tab recovery: instead of window.location.href on the broken page
  (which fails when JS is disabled), open a fresh new tab via context.newPage()
  and navigate there, then attachToPage() the new tab
- This correctly escapes frozen/JS-disabled OEM widget tabs
- Added cdp_scroll handling in the CDP action dispatch block
  alongside cdp_click, cdp_type, cdp_read_text
- Supports direction (up/down/left/right), amount (px), optional selector
- Falls back to key_press ArrowDown on error rather than failing hard
- Added cdp_scroll examples to the L2 system prompt so the LLM knows
  it can scroll web pages natively without using keyboard shortcuts

Tested: successfully scrolled Reddit front page, found and clicked
upvote button via cdp_click by_text='upvote'. Task correctly escalated
to needs_human when Reddit login wall appeared (expected — not a bug).
…owerShell

Implements the OcrEngine class (src/ocr-engine.ts) that provides OS-level
OCR with bounding boxes in real screen pixels. This is the foundation for
the OCR-first architecture in v0.8.0, replacing the a11y tree as the
primary UI read layer.

- Windows: PowerShell script using Windows.Media.Ocr WinRT API
- macOS/Linux: graceful stub (isAvailable() returns false)
- 300ms result cache with invalidateCache() for action dirty-bit
- 20 new unit tests (all 75 tests pass, 0 TS errors)
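
The 300ms cache with explicit invalidation could be sketched as follows. `TimedCache` is a hypothetical class, not the OcrEngine implementation; the injectable clock exists only to make the behavior testable.

```typescript
class TimedCache<T> {
  private entry: { value: T; storedAt: number } | undefined;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = 300, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  /** Returns the cached value, or undefined when missing or expired. */
  get(): T | undefined {
    if (this.entry === undefined) return undefined;
    if (this.now() - this.entry.storedAt >= this.ttlMs) {
      this.entry = undefined;
      return undefined;
    }
    return this.entry.value;
  }

  set(value: T): void {
    this.entry = { value, storedAt: this.now() };
  }

  /** Called after any action that may have changed the screen. */
  invalidate(): void {
    this.entry = undefined;
  }
}
```

The short TTL bounds staleness when nothing is known to have changed, while `invalidate()` handles the case where the agent itself just clicked or typed.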

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PowerShell's ConvertTo-Json can leave unescaped control characters
(e.g. bell \x07 from OCR'd icon text) that break JSON.parse().
Strips them in both the PS script and the TS layer for defense-in-depth.
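
On the TypeScript side, the stripping step might look like this sketch. `stripControlChars` is a hypothetical name; the choice to keep tab, line feed, and carriage return (valid JSON whitespace between tokens) is an assumption.

```typescript
/**
 * Removes ASCII control characters that JSON.parse rejects inside strings,
 * keeping \t, \n and \r because they are legal whitespace between tokens.
 */
function stripControlChars(json: string): string {
  return json.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "");
}

/** Parses PowerShell output after sanitizing it. */
function parsePowerShellJson(raw: string): unknown {
  return JSON.parse(stripControlChars(raw));
}
```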

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…uts_execute)

Bridges the gap between the internal ActionRouter (which fuzzy-matches
tasks against 68 keyboard shortcuts) and external MCP agents that
previously had to independently know key combos. Two new tools:
  - shortcuts_list: query shortcuts by category and/or app context
  - shortcuts_execute: run a shortcut by intent with fuzzy matching

15 new tests, all 90 pass. Tool count: 33 → 35.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Complete v0.8.0 architecture — OCR is now the primary read layer:

  L0: LocalTaskParser (regex, no LLM)
  L1: ActionRouter (shortcuts, deterministic)
  L1.5: SmartInteraction (CDP + UIDriver)
  L2: SkillCache — replays learned task paths (grows over time)
  L2.5: OcrReasoner — OCR + a11y tree + text LLM (primary path)
  L3: VisionLLM — fallback only (unchanged)

Key design: OCR snapshot includes BOTH OCR text AND a11y tree, so if
OCR+A11y combined can't handle it, skip straight to vision (per user
requirement — no separate A11y-only fallback step).

New files:
  - src/ocr-reasoner.ts — L2.5 loop (OCR → text LLM → action → verify)
  - src/skill-cache.ts — learns from 2+ successful runs, auto-promotes
  - src/tools/ocr.ts — ocr_read_screen MCP tool (36 tools total)

Modified:
  - src/agent.ts — SkillCache (L2) + OcrReasoner (L2.5) inserted
  - src/providers.ts — ocrEnabled, skillCacheEnabled config flags
  - src/tools/index.ts — registered OCR tools

All 90 tests pass, 0 TypeScript errors. Live OCR tested successfully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rt_read, smart_type, invoke_element)

These 4 tools let MCP clients interact with the desktop WITHOUT screenshots
or coordinate math. They use a11y → CDP → OCR fallback chains:

- smart_read: primary perception tool, reads screen via a11y/CDP/OCR
- smart_click: click by element name, no coordinates needed
- smart_type: type into element by name with auto-focus
- invoke_element: direct UIA invoke with set-value/get-value/focus support

Key finding: a11y coordinates and nut-js mouseClick share the same coordinate
system — no conversion needed. The smart tools pass coords directly, avoiding
the broken a11yToMouse() dpiRatio division.

Also improved ocr_read_screen with dpiRatio hint for manual coordinate math.
Total: 40 MCP tools. 112 tests passing (22 new smart tool tests).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lement

smart_read: runs OCR and a11y in parallel via Promise.all(), OCR text
shown first with a11y tree appended as supplement section.

smart_click: OCR scan + a11y invoke run in parallel; a11y invoke wins
if it succeeds (most reliable OS-level click), otherwise OCR coordinate
click, then a11y bounds fallback, then CDP as last resort.

Tests updated for OCR-first expectations (112/112 pass).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ort fix

- tool-server.ts: validate request params against tool schema before execution
- safe-json.ts: balanced-brace JSON extraction replaces greedy regex across
  ai-brain.ts, smart-interaction.ts (prevents malformed LLM response crashes)
- ps-runner.ts: cap command queue at 100 (backpressure on long sessions)
- a11y-reasoner.ts, cdp-driver.ts, browser-layer.ts: replace silent catch {}
  blocks with console.debug logging for debuggability
- onboarding.ts + index.ts: honor custom --port in consent text and startup log
- docs/v0.7.0/index.html: update to 40 tools, OCR-first pipeline, smart tools
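
To show why balanced-brace extraction is more robust than a greedy regex, here is a minimal sketch. `extractFirstJsonObject` is a hypothetical function, not the contents of safe-json.ts. It tracks brace depth while skipping braces inside string literals, and returns the first candidate that parses.

```typescript
/** Returns the first parsable JSON object embedded in `text`, or undefined. */
function extractFirstJsonObject(text: string): unknown {
  for (let start = text.indexOf("{"); start !== -1; start = text.indexOf("{", start + 1)) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < text.length; i++) {
      const ch = text[i];
      if (inString) {
        // Inside a string literal: only watch for escapes and the closing quote.
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === "{") depth++;
      else if (ch === "}") {
        depth--;
        if (depth === 0) {
          try {
            return JSON.parse(text.slice(start, i + 1));
          } catch {
            break; // not valid JSON; try the next opening brace
          }
        }
      }
    }
  }
  return undefined;
}
```

A greedy pattern such as `/\{.*\}/` would span from the first `{` to the last `}` in the response, which fails as soon as the model emits two objects or trailing prose containing a brace.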

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Text model: sends "Reply with exactly: CLAWD_OK", verifies response
  contains the token (catches HTML error pages, quota errors, broken endpoints)
- Vision model: sends 8x8 green PNG image, verifies non-empty response
  (catches text-only endpoints that would fail at runtime in Layer 3)
- Smoke test phase: a11y reads active window title → sends to LLM → verifies
  round-trip (catches pipeline wiring bugs between perception and reasoning)
- Timeout reduced from 15s to 8s for text, 10s for vision
- extractErrorMessage() shared helper for consistent error formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…esults table

SKILL.md rewritten per task brief — explicit tool decision trees,
sensitive app policy, error recovery table, canvas app patterns.
537 lines, framework-agnostic. README test results table removed.
index.ts: deduplicate createToolContext() shared by mcp + serve.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…RM64)

- macOS OCR via Apple Vision framework (VNRecognizeTextRequest) — Swift script
- Linux OCR via Tesseract — Python script with CLI and pytesseract fallback
- macOS getFocusedElement via JXA — was returning null, now reads focused UI element
- ocr-engine.ts routes to platform-specific implementation at runtime
- All changes additive — zero impact to existing Windows code paths
- Platform support tables updated in README.md and SKILL.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
AmrDab and others added 8 commits March 16, 2026 20:19
…d, root cleanup

BLOCKERS FIXED:
- task/stop/kill CLI commands now send Bearer auth token (were always 401)
- serve mode now generates + displays auth token, protects POST endpoints
- --skip-consent restricted to NODE_ENV=development

HIGH PRIORITY:
- Emoji gate: all console emoji wrapped in e(emoji, fallback) for Windows
  terminals not in UTF-8 mode. Shared via src/format.ts
- .claude/ added to .gitignore (worktrees were polluting git)
- Root clutter moved: test-*.{sh,js} → tests/, *.md → docs/
- CLAUDE_v0.8.0.md deleted (spec absorbed into codebase)
- All "v0.8.0" comments in src/ rebranded — OCR + SkillCache are v0.7.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- npm package name: clawdcursor (was clawd-cursor)
- bin command: clawdcursor only (removed clawd-cursor alias)
- Data directory: ~/.clawdcursor/ (was ~/.clawd-cursor/)
- Migration: paths.ts auto-migrates from ~/.clawd-cursor/ and
  ~/.openclaw/clawd-cursor/ on first run
- All 22 files updated: src/, docs/, README, SKILL.md, website,
  package.json, CHANGELOG, scripts
- Display name "Clawd Cursor" (with space) unchanged — it's the brand

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- .clawd-config.json → .clawdcursor-config.json (config file name)
- .clawd-favorites.json → .clawdcursor-favorites.json
- clawd-task-* → clawdcursor-task-* (temp file prefixes)
- clawd-ocr-* → clawdcursor-ocr-* (temp file prefixes)
- clawd-edge → clawdcursor-edge (Edge user data dir)
- Git remote updated: AmrDab/clawdcursor.git
- .gitignore covers both old and new config file names
- Tests updated to match new naming

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…legate_to_agent auth

- POST /execute/* now requires Bearer token (was completely unprotected)
- Token generation moved to lazy init — stops CLI commands (stop, task, consent)
  from overwriting the running server's token on import
- delegate_to_agent and abort calls now include auth headers
- Website install instructions updated: npm install → git clone + npm run setup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
// URL-based section selection — most accurate for browser tabs
if (currentUrl) {
const url = currentUrl.toLowerCase();
if (url.includes('google.com/travel/flights') || url.includes('flights.google.com')) {

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

'flights.google.com' can be anywhere in the URL, and arbitrary hosts may come before or after it.

Copilot Autofix

AI 1 day ago

In general, to fix incomplete URL substring sanitization you should parse the URL with a proper URL parser and inspect its hostname (or host) instead of doing substring searches over the entire URL string. For matching a known domain and its subdomains, you should require that the hostname is either exactly the expected domain or ends with . + expected domain, not just that it “includes” the domain string.

In this file, the best fix is to parse currentUrl using the standard URL constructor, extract hostname, and then perform robust domain checks against that hostname. We should replace the url.includes(...) checks under the // URL-based section selection — most accurate for browser tabs comment with hostname-aware checks. To preserve behavior of also matching content-specific paths (like google.com/travel/flights vs other Google URLs), we can still use url.includes('google.com/travel/flights') for the path-based part (since that does not affect host validation) but replace the pure domain checks with hostname comparisons.

Concretely, within src/a11y-reasoner.ts around lines 215–227:

  • Introduce a small helper inside that block to safely parse the URL and derive hostname and maybe pathname.
  • Replace:
    • url.includes('flights.google.com') with a hostname check like hostname === 'flights.google.com'.
    • url.includes('tripadvisor.com') with hostname === 'tripadvisor.com' or a controlled subdomain match (if you want to support www.tripadvisor.com, etc.). To keep behavior reasonably broad while still safe, we can allow hostname === 'tripadvisor.com' || hostname.endsWith('.tripadvisor.com').
    • url.includes('docs.google.com') similarly with hostname === 'docs.google.com' or .endsWith('.docs.google.com') as appropriate (though in practice docs.google.com typically has no subdomains).
  • Keep the existing url.includes('google.com/travel/flights') because it’s a path-based heuristic and not the vulnerable domain check (and it’s combined with host-aware checks for other domains).

We can implement this using the built-in URL class available in Node.js and modern browsers; no new imports are needed. We just need to add a small try/catch for malformed URLs and fall back gracefully.
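
The hostname rule described above (exact match, or a suffix match on "." plus the domain) can be isolated in one helper. This is a sketch with a hypothetical name, `hostMatches`, and is separate from the Autofix patch. It relies only on the standard `URL` class.

```typescript
/**
 * True when the URL's hostname is `domain` itself or one of its subdomains.
 * Malformed URLs never match.
 */
function hostMatches(rawUrl: string, domain: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false;
  }
  const d = domain.toLowerCase();
  // The leading dot prevents "nottripadvisor.com" from matching "tripadvisor.com".
  return hostname === d || hostname.endsWith("." + d);
}
```

Unlike `url.includes('tripadvisor.com')`, this rejects URLs where the domain appears only in the query string, in the path, or as a prefix of an attacker-controlled host such as `tripadvisor.com.evil.example`.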

Suggested changeset 1
src/a11y-reasoner.ts

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/a11y-reasoner.ts b/src/a11y-reasoner.ts
--- a/src/a11y-reasoner.ts
+++ b/src/a11y-reasoner.ts
@@ -215,13 +215,35 @@
         // URL-based section selection — most accurate for browser tabs
         if (currentUrl) {
           const url = currentUrl.toLowerCase();
-          if (url.includes('google.com/travel/flights') || url.includes('flights.google.com')) {
+          let hostname: string | null = null;
+          try {
+            const parsed = new URL(currentUrl);
+            hostname = parsed.hostname.toLowerCase();
+          } catch {
+            // If URL parsing fails, fall back to simple string heuristics on the full URL
+          }
+
+          // Google Flights: match either the specific travel path on google.com or the flights.google.com host
+          if (
+            url.includes('google.com/travel/flights') ||
+            (hostname !== null && hostname === 'flights.google.com')
+          ) {
             if (title.includes('google flights')) includeSection = true;
           }
-          if (url.includes('tripadvisor.com')) {
+
+          // TripAdvisor: match main domain or its subdomains
+          if (
+            hostname !== null &&
+            (hostname === 'tripadvisor.com' || hostname.endsWith('.tripadvisor.com'))
+          ) {
             if (title.includes('tripadvisor') || title.includes('google flights')) includeSection = true;
           }
-          if (url.includes('docs.google.com')) {
+
+          // Google Docs: match docs.google.com host (and allow subdomains if ever used)
+          if (
+            hostname !== null &&
+            (hostname === 'docs.google.com' || hostname.endsWith('.docs.google.com'))
+          ) {
             if (title.includes('google docs')) includeSection = true;
           }
         } else if (processName === 'msedge') {
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
if (url.includes('google.com/travel/flights') || url.includes('flights.google.com')) {
if (title.includes('google flights')) includeSection = true;
}
if (url.includes('tripadvisor.com')) {

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

'tripadvisor.com' can be anywhere in the URL, and arbitrary hosts may come before or after it.

Copilot Autofix

AI 1 day ago

In general, to fix incomplete URL substring sanitization, you should parse the URL using a proper URL parser and then perform checks against structured components like hostname and pathname, instead of using .includes() on the full URL string. For domain checks, compare against the hostname (and possibly its subdomains) in a precise way; for path checks, use normalized path strings or regular expressions anchored appropriately.

For this specific code block in src/a11y-reasoner.ts, the goal is to keep existing behavior (selecting suitable help sections) while replacing naive url.includes(...) checks with structured checks. We can do this by:

  • Constructing a URL instance from currentUrl (using the standard WHATWG URL class available in Node).
  • Extracting hostname and pathname in lowercase.
  • Rewriting:
    • Google Flights logic to check:
      • hostname is flights.google.com, or
      • hostname ends with .google.com or equals google.com and pathname begins with /travel/flights.
    • TripAdvisor logic to check:
      • hostname is tripadvisor.com or ends with .tripadvisor.com.
    • Google Docs logic to check:
      • hostname is docs.google.com or ends with .docs.google.com (for completeness).

Because this is not security‑critical routing but feature selection, we can be slightly permissive (allowing subdomains) while eliminating misleading matches in query strings or unrelated parts of the URL. We should also wrap new URL(currentUrl) in a try/catch so that malformed URLs do not throw and instead cause the code to fall back to current behavior (includeSection remains as decided by earlier conditions).

Concretely:

  • Inside the if (currentUrl) { ... } block (around lines 216–227), replace the substring checks with code that:
    • Initializes local hostname and pathname using new URL(currentUrl), falling back to the raw url string if parsing fails.
    • Uses hostname/path checks as described above to set includeSection.
  • No new imports are needed because URL is global in modern Node; and we were instructed not to touch other parts/imports unless necessary.

Suggested changeset 1
src/a11y-reasoner.ts

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/a11y-reasoner.ts b/src/a11y-reasoner.ts
--- a/src/a11y-reasoner.ts
+++ b/src/a11y-reasoner.ts
@@ -215,14 +215,43 @@
         // URL-based section selection — most accurate for browser tabs
         if (currentUrl) {
           const url = currentUrl.toLowerCase();
-          if (url.includes('google.com/travel/flights') || url.includes('flights.google.com')) {
-            if (title.includes('google flights')) includeSection = true;
+          let hostname: string | undefined;
+          let pathname: string | undefined;
+          try {
+            const parsed = new URL(currentUrl);
+            hostname = parsed.hostname.toLowerCase();
+            pathname = parsed.pathname.toLowerCase();
+          } catch {
+            // Fallback: treat the whole lowercased string as hostname surrogate
+            hostname = url;
+            pathname = '';
           }
-          if (url.includes('tripadvisor.com')) {
+
+          // Google Flights — either dedicated subdomain or /travel/flights path on google.com
+          const isGoogleFlightsHost =
+            hostname === 'flights.google.com' ||
+            (hostname !== undefined &&
+              (hostname === 'google.com' || hostname.endsWith('.google.com')) &&
+              pathname !== undefined &&
+              pathname.startsWith('/travel/flights'));
+          if (isGoogleFlightsHost && title.includes('google flights')) {
+            includeSection = true;
+          }
+
+          // TripAdvisor — main domain or any subdomain
+          const isTripadvisorHost =
+            hostname === 'tripadvisor.com' ||
+            (hostname !== undefined && hostname.endsWith('.tripadvisor.com'));
+          if (isTripadvisorHost) {
             if (title.includes('tripadvisor') || title.includes('google flights')) includeSection = true;
           }
-          if (url.includes('docs.google.com')) {
-            if (title.includes('google docs')) includeSection = true;
+
+          // Google Docs — docs.google.com or its subdomains
+          const isGoogleDocsHost =
+            hostname === 'docs.google.com' ||
+            (hostname !== undefined && hostname.endsWith('.docs.google.com'));
+          if (isGoogleDocsHost && title.includes('google docs')) {
+            includeSection = true;
           }
         } else if (processName === 'msedge') {
           // Fallback when URL unknown: include both for msedge
EOF
if (url.includes('tripadvisor.com')) {
if (title.includes('tripadvisor') || title.includes('google flights')) includeSection = true;
}
if (url.includes('docs.google.com')) {

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

'docs.google.com' can be anywhere in the URL, and arbitrary hosts may come before or after it.

Copilot Autofix

AI 1 day ago

Generally, to fix this kind of problem you should parse the URL using a proper URL parser (for example the built‑in URL class in modern Node/TypeScript), then perform checks against well-defined components such as hostname and pathname instead of using includes on the entire string.

For this specific case in src/a11y-reasoner.ts, we only need to refine the logic inside the if (currentUrl) { ... } block around lines 216–226. We can parse currentUrl once into a URL object and then:

  • For Google Flights: require hostname to be either google.com with a /travel/flights path prefix, or flights.google.com (any path).
  • For TripAdvisor: require hostname to be tripadvisor.com or a subdomain of it.
  • For Google Docs: require hostname to be docs.google.com (optionally allowing subdomains if desired), instead of url.includes('docs.google.com').

We can wrap parsing in a try/catch to avoid throwing on malformed URLs and, on failure, fall back to the previous substring behavior to avoid changing functionality too drastically. No new imports are required: the global URL class is available in modern Node.js and TypeScript DOM lib types. The actual code change will be to replace the const url = currentUrl.toLowerCase(); block and the subsequent if (url.includes(...)) checks with logic that uses new URL(currentUrl) and then conditions based on hostname and pathname. All changes remain within the provided snippet in src/a11y-reasoner.ts.
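As a minimal sketch of why the substring check is flagged, the snippet below compares `includes` on the whole string with a check on the parsed hostname. The helper name `hostMatches` and the sample URLs are hypothetical, not from the PR. Unlike the Autofix patch, this sketch treats a malformed URL as "no match" instead of falling back to substring checks; that is an assumption made here for illustration.

```typescript
// Hypothetical helper: true when `rawUrl` parses and its hostname is
// exactly `domain` or a subdomain of it.
function hostMatches(rawUrl: string, domain: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    // Malformed URL: report no match rather than guessing from the raw string.
    return false;
  }
  return hostname === domain || hostname.endsWith('.' + domain);
}

// Sample URLs (made up for this example).
const spoofed = 'https://evil.example/?next=docs.google.com';
const real = 'https://docs.google.com/document/d/abc';

// The substring check accepts the spoofed URL because the text appears in the query string.
const substringSpoofed = spoofed.includes('docs.google.com');

// The hostname check rejects the spoofed URL and accepts the real one.
const parsedSpoofed = hostMatches(spoofed, 'docs.google.com');
const parsedReal = hostMatches(real, 'docs.google.com');
```

The leading dot in `endsWith('.' + domain)` matters: without it, a host such as `notdocs.google.com` would also pass the suffix test.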

Suggested changeset 1
src/a11y-reasoner.ts

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/a11y-reasoner.ts b/src/a11y-reasoner.ts
--- a/src/a11y-reasoner.ts
+++ b/src/a11y-reasoner.ts
@@ -214,16 +214,45 @@
 
         // URL-based section selection — most accurate for browser tabs
         if (currentUrl) {
-          const url = currentUrl.toLowerCase();
-          if (url.includes('google.com/travel/flights') || url.includes('flights.google.com')) {
-            if (title.includes('google flights')) includeSection = true;
+          let hostname = '';
+          let pathname = '';
+          try {
+            const parsed = new URL(currentUrl);
+            hostname = parsed.hostname.toLowerCase();
+            pathname = parsed.pathname.toLowerCase();
+          } catch {
+            // Fallback to original behavior on malformed URLs
+            const url = currentUrl.toLowerCase();
+            if (url.includes('google.com/travel/flights') || url.includes('flights.google.com')) {
+              if (title.includes('google flights')) includeSection = true;
+            }
+            if (url.includes('tripadvisor.com')) {
+              if (title.includes('tripadvisor') || title.includes('google flights')) includeSection = true;
+            }
+            if (url.includes('docs.google.com')) {
+              if (title.includes('google docs')) includeSection = true;
+            }
           }
-          if (url.includes('tripadvisor.com')) {
-            if (title.includes('tripadvisor') || title.includes('google flights')) includeSection = true;
+
+          if (hostname) {
+            // Google Flights: google.com/travel/flights or flights.google.com
+            if (
+              (hostname === 'google.com' && pathname.startsWith('/travel/flights')) ||
+              hostname === 'flights.google.com'
+            ) {
+              if (title.includes('google flights')) includeSection = true;
+            }
+
+            // TripAdvisor: tripadvisor.com or any subdomain
+            if (hostname === 'tripadvisor.com' || hostname.endsWith('.tripadvisor.com')) {
+              if (title.includes('tripadvisor') || title.includes('google flights')) includeSection = true;
+            }
+
+            // Google Docs: docs.google.com or any subdomain
+            if (hostname === 'docs.google.com' || hostname.endsWith('.docs.google.com')) {
+              if (title.includes('google docs')) includeSection = true;
+            }
           }
-          if (url.includes('docs.google.com')) {
-            if (title.includes('google docs')) includeSection = true;
-          }
         } else if (processName === 'msedge') {
           // Fallback when URL unknown: include both for msedge
           if (title.includes('google flights')) includeSection = true;
EOF
Comment on lines +297 to +308
app.get('/task-logs/current', (_req, res) => {
try {
const logger = (agent as any).logger;
const logPath = logger?.getCurrentLogPath();
if (!logPath || !require('fs').existsSync(logPath)) {
return res.status(404).json({ error: 'No current log' });
}
const content = require('fs').readFileSync(logPath, 'utf-8');
const entries = content.trim().split('\n').map((l: string) => { try { return JSON.parse(l); } catch { return null; } }).filter(Boolean);
res.json(entries);
} catch { res.status(500).json({ error: 'Failed to read log' }); }
});

Check failure

Code scanning / CodeQL

Missing rate limiting High

This route handler performs a file system access, but is not rate-limited.

Copilot Autofix

AI 1 day ago

In general, the fix is to apply rate limiting middleware to the route handler that performs filesystem access so that a single client (typically identified by IP) cannot flood the endpoint with requests and exhaust server resources. The standard way in an Express app is to use a well-known library such as express-rate-limit, configure sensible limits, and attach the resulting middleware to the specific route (or group of routes) that perform expensive operations.

Concretely, in src/server.ts, we should: (1) import express-rate-limit; (2) create a rate limiter instance that allows a modest number of /task-logs/current requests per IP in a time window (e.g., 30 requests per minute); and (3) attach this limiter as middleware on the /task-logs/current route. This preserves existing behavior while adding protection. The code changes are confined to src/server.ts within the shown code: add one import at the top near the other imports, define taskLogsCurrentLimiter before the route is declared, and update the /task-logs/current route definition at line 297 to pass the limiter as a middleware argument: app.get('/task-logs/current', taskLogsCurrentLimiter, (_req, res) => { ... });. Because express-rate-limit is a new dependency, package.json also needs an entry for it.
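To make the "30 requests per minute per IP" idea concrete, here is a dependency-free sketch of a fixed-window counter. It is an illustration only and is not how express-rate-limit is implemented; the class name `FixedWindowLimiter` and its `allow` method are hypothetical. The `now` parameter exists so the behavior can be checked without waiting on a real clock.

```typescript
// Illustrative fixed-window counter keyed by client identifier (for example, an IP).
// NOT the express-rate-limit implementation; names here are hypothetical.
class FixedWindowLimiter {
  private readonly windowMs: number;
  private readonly max: number;
  private readonly hits = new Map<string, { count: number; windowStart: number }>();

  constructor(windowMs: number, max: number) {
    this.windowMs = windowMs;
    this.max = max;
  }

  // Returns true if the request from `key` is allowed at time `now` (milliseconds).
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request from this key, or the previous window has expired: start a new window.
      this.hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.max;
  }
}

// Same numbers as the patch: a 1-minute window with at most 30 requests per key.
const limiter = new FixedWindowLimiter(60 * 1000, 30);
```

In a route handler, a denied request would typically be answered with HTTP 429 before any file system access happens, which is the protection the CodeQL alert asks for.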

Suggested changeset 2
src/server.ts

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/server.ts b/src/server.ts
--- a/src/server.ts
+++ b/src/server.ts
@@ -22,6 +22,7 @@
 import { join } from 'path';
 import { randomBytes } from 'crypto';
 import { z } from 'zod';
+import rateLimit from 'express-rate-limit';
 import type { ClawdConfig } from './types';
 import { Agent } from './agent';
 import { mountDashboard } from './dashboard';
@@ -294,7 +295,14 @@
     } catch { res.json([]); }
   });
 
-  app.get('/task-logs/current', (_req, res) => {
+  const taskLogsCurrentLimiter = rateLimit({
+    windowMs: 60 * 1000, // 1 minute
+    max: 30, // limit each IP to 30 requests per window
+    standardHeaders: true,
+    legacyHeaders: false,
+  });
+
+  app.get('/task-logs/current', taskLogsCurrentLimiter, (_req, res) => {
     try {
       const logger = (agent as any).logger;
       const logPath = logger?.getCurrentLogPath();
EOF
package.json
Outside changed files

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/package.json b/package.json
--- a/package.json
+++ b/package.json
@@ -26,7 +26,8 @@
     "playwright": "^1.58.2",
     "sharp": "^0.33.0",
     "ws": "^8.16.0",
-    "zod": "^3.25.76"
+    "zod": "^3.25.76",
+    "express-rate-limit": "^8.3.1"
   },
   "devDependencies": {
     "@eslint/js": "^9.39.3",
EOF
This fix introduces these dependencies

| Package | Version | Security advisories |
| --- | --- | --- |
| express-rate-limit (npm) | 8.3.1 | None |
AmrDab added a commit that referenced this pull request Mar 19, 2026
…nt improved

Phase 1 — Vision LLM centralization complete:
- ai-brain.ts: removed 2 hand-rolled methods (callAnthropic + callOpenAICompat),
  unified to callVisionLLMDirect() with streaming support
- a11y-reasoner.ts: replaced 44-line inline fetch with callVisionLLMDirect()
- doctor.ts: replaced 55-line testVisionModel fetch with callVisionLLMDirect()
- ~170 lines of duplicated vision code removed

PR #39 integrated (memory-bloat-and-stale-processes):
- native-desktop.ts: release sharp RGBA buffers after processing (4 sites)
- native-desktop.ts: clear EventEmitter listeners on disconnect
- index.ts: single-instance pidfile lock + SIGINT/SIGTERM teardown
- agent.ts: defensive timeout handle null-check
- server.ts: log message truncation (500 char limit)

SKILL.md updated as MCP instruction manual:
- Quick Start section explaining MCP vs Agent modes
- Troubleshooting section (404, 401, macOS OCR, DPI)
- Platform Notes for macOS (OCR, CDP, Retina scaling)
- Clarified delegate_to_agent requires clawdcursor start

delegate_to_agent error messaging improved:
- ECONNREFUSED → "server not running, run clawdcursor start"
- 404 → "wrong version, need v0.7.0+"
- 401 → "token mismatch, restart server"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
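
The delegate_to_agent error mapping listed in the commit message above could be sketched roughly as follows. The `DelegateFailure` shape, the function name `describeDelegateFailure`, and the fallback string are assumptions made for illustration; only the three code-to-message pairs come from the commit message.

```typescript
// Hypothetical shape of a failed call: a Node-style error code or an HTTP status.
interface DelegateFailure {
  code?: string;
  status?: number;
}

// Maps a failure to the hint text quoted in the commit message.
function describeDelegateFailure(failure: DelegateFailure): string {
  if (failure.code === 'ECONNREFUSED') return 'server not running, run clawdcursor start';
  if (failure.status === 404) return 'wrong version, need v0.7.0+';
  if (failure.status === 401) return 'token mismatch, restart server';
  // Fallback text is an assumption; the commit message does not specify one.
  return 'unknown error';
}
```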